feat(strands-py): add GoalLoop vended plugin with docs #2738

Merged: notowen333 merged 18 commits into strands-agents:main from notowen333:python-goal-plugin-with-docs on Jun 15, 2026

@notowen333 notowen333 commented Jun 11, 2026

Contributor

Description

Agents often need iterative refinement — retry until the response meets a quality bar. Today that means hand-rolling a loop with timeout logic, attempt tracking, feedback injection, and state management. GoalLoop encapsulates all of that as a vended plugin that works inside the existing hook lifecycle.
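For context, the hand-rolled loop this paragraph describes might look roughly like the sketch below. It is not the GoalLoop implementation; `refine`, `LoopResult`, `generate`, and `validate` are hypothetical names used only to show the timeout check, attempt tracking, and feedback injection that the plugin is meant to absorb.

```python
import time
from dataclasses import dataclass, field


@dataclass
class LoopResult:
    """Hypothetical aggregate result, loosely shaped like the PR's GoalResult."""
    passed: bool
    stop_reason: str  # "satisfied", "max_attempts", or "timeout"
    attempts: list = field(default_factory=list)


def refine(generate, validate, max_attempts=3, timeout=30.0):
    """Retry generate() until validate() passes or a budget runs out."""
    deadline = time.monotonic() + timeout
    result = LoopResult(passed=False, stop_reason="max_attempts")
    feedback = None
    for attempt in range(1, max_attempts + 1):
        if time.monotonic() >= deadline:
            result.stop_reason = "timeout"
            return result
        response = generate(feedback)          # feedback injection
        passed, feedback = validate(response)  # (bool, feedback or None)
        result.attempts.append((attempt, passed, feedback))
        if passed:
            result.passed = True
            result.stop_reason = "satisfied"
            return result
    return result


# Usage with canned drafts standing in for model calls.
drafts = iter(["a very long rambling answer indeed", "short answer"])


def generate(feedback):
    return next(drafts)


def validate(text):
    words = len(text.split())
    if words <= 3:
        return True, None
    return False, f"Too long ({words} words)."


result = refine(generate, validate)
print(result.stop_reason, len(result.attempts))
```

Every caller that wants this behavior has to repeat the same bookkeeping, which is the duplication the description is pointing at.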

This PR ports the GoalLoop plugin from TypeScript to Python and adds a dual-language documentation page.

Public API Changes

New module: strands.vended_plugins.goal

from strands import Agent
from strands.vended_plugins.goal import GoalLoop

# Natural-language goal — judged by an internal agent built from the host's model
concise = GoalLoop(
    goal="At most 3 sentences, accessible to a 10-year-old, no jargon.",
    max_attempts=3,
)

agent = Agent(plugins=[concise])
agent("Explain how rainbows form.")
print(concise.last_result(agent))
# GoalResult(passed=True, stop_reason='satisfied', attempts=[...])

# Programmatic validator: pass a callable to skip the judge agent entirely
from strands.vended_plugins.goal import GoalLoop

def word_count_validator(response, agent):
    text = " ".join(
        block["text"] for block in response["content"] if "text" in block
    )
    words = len(text.split())
    if words <= 50:
        return True
    return {"passed": False, "feedback": f"Too long ({words} words). Cap at 50."}

plugin = GoalLoop(goal=word_count_validator, max_attempts=5, timeout=30.0)

Exported symbols

| Symbol | Kind | Purpose |
| --- | --- | --- |
| GoalLoop | Plugin class | Main entry point; attach to an agent via plugins=[...] |
| GoalResult | Dataclass | Aggregate result with passed, stop_reason, attempts |
| GoalAttempt | Dataclass | Per-attempt record: attempt, passed, feedback |
| GoalStopReason | Literal type | "satisfied" \| "max_attempts" \| "timeout" |
| JudgeConfig | Dataclass | Optional judge tuning: model, system_prompt |
| ValidationOutcome | Dataclass | Canonical validator return: passed, feedback |
| Validator | Protocol | Type for programmatic validator callables |
| JUDGE_SYSTEM_PROMPT | str | Default system prompt for the NL judge |
| JudgeOutcome | Pydantic model | Structured output schema the judge fills |
| build_judge_prompt | Function | Builds the judge input from a goal + transcript |

GoalLoop constructor parameters

| Parameter | Default | Description |
| --- | --- | --- |
| goal | (required) | NL string (judged by internal agent) or callable validator |
| max_attempts | inf | Maximum attempts before stopping |
| timeout | inf | Wall-clock budget in seconds |
| judge | None | JudgeConfig to override the judge model or system prompt |
| preserve_context | True | Keep conversation history across retries |
| resume_prompt_template | (built-in) | `Callable[[str |
| name | "strands:goal-loop" | Plugin name (must be unique per agent) |

Related Issues

N/A - new feature port

Documentation PR

Included in this PR under site/src/content/docs/user-guide/concepts/plugins/goal-loop.mdx with dual-language tabs (Python + TypeScript).

Type of Change

New feature

Testing

  • I ran hatch run prepare

Checklist

  • I have read the CONTRIBUTING document
  • I have added any necessary tests that prove my fix is effective or my feature works
  • I have updated the documentation accordingly
  • I have added an appropriate example to the documentation to outline the feature, or no new docs are needed
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@github-actions github-actions Bot added enhancement, python, area-hooks, area-structured-output, documentation, size/xl and removed size/xl labels Jun 11, 2026
@agent-of-mkmeral

Contributor

Re-check at head 7d855d0 (covers 930c808 + 7d855d0)

Verdict: the port is still faithful, and all four items from my original review are resolved. Re-verified the changed surface against the TS source and ran the suite — 41/41 unit tests pass locally.

✅ The one real fidelity gap is fixed (Ralph-mode system_prompt rewind)

run.initial_snapshot = event.agent.take_snapshot(
    preset="session", include=["system_prompt"], exclude=["state"]
)

Verified empirically against the snapshot resolver:

before fix: ['conversation_manager_state', 'interrupt_state', 'messages', 'model_state']
after fix:  ['conversation_manager_state', 'interrupt_state', 'messages', 'model_state', 'system_prompt']

Python now rewinds everything the TS session preset rewinds (plus conversation_manager_state, which is documented inline as an intentional Python-only divergence — exactly what I'd hoped for). The test at test_snapshot_taken_on_first_model_call pins the exact call signature, so this can't silently regress. And the docs claim in goal-loop.mdx ("messages, system prompt, model state") is now accurate for Python — no doc change needed.
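To make the before/after field lists easier to read, here is a minimal sketch of how a preset plus `include` and `exclude` could resolve to a field set. This is not the actual snapshot resolver: `resolve_fields` and the contents of `PRESETS` are assumptions inferred only from the two lists quoted above, and the "before" call shape is a guess.

```python
# Hypothetical preset table; the real "session" preset may differ.
PRESETS = {
    "session": {
        "messages",
        "state",
        "conversation_manager_state",
        "interrupt_state",
        "model_state",
    },
}


def resolve_fields(preset, include=(), exclude=()):
    """Start from the preset, add `include`, then drop `exclude`."""
    fields = set(PRESETS[preset]) | set(include)
    return sorted(fields - set(exclude))


# Assumed shape of the call before the fix (no include=...).
before = resolve_fields("session", exclude=["state"])
# Shape of the call quoted in the review.
after = resolve_fields("session", include=["system_prompt"], exclude=["state"])
print(before)
print(after)
```

Under these assumptions the only difference between the two results is `system_prompt`, which is the field the Ralph-mode rewind was missing.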

✅ Minor items from my review, all addressed

  • Judge-path unit tests — the previously untested _judge_validator now has 6 mocked tests (test_nl_judge_*: first-attempt pass, feedback loop, judge.model override, judge.system_prompt override, no-structured-output fallback, fresh-agent-per-validation), matching the TS suite's coverage 1:1. The patch target (strands.agent.agent.Agent) correctly intercepts the plugin's lazy import.
  • GoalStopReason — now Literal["satisfied", "max_attempts", "timeout"].
  • import inspect — moved to module level.
  • Bonus: judge.py rendering helpers got real types (Message/ContentBlock/ToolResultContent instead of dict) — behavior unchanged, port fidelity unaffected.
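The remark about the patch target can be illustrated with a small, self-contained sketch. When a function imports a class lazily at call time, patching the attribute on the defining module is enough, because the import reads the attribute while the patch is active. The module name `fake_agent_module` and the `judge` function are hypothetical stand-ins, not the plugin's code.

```python
import sys
import types
from unittest.mock import patch

# Build a throwaway module that plays the role of the module defining Agent.
module = types.ModuleType("fake_agent_module")


class Agent:
    def __call__(self, prompt):
        return "real"


module.Agent = Agent
sys.modules["fake_agent_module"] = module


def judge(prompt):
    # Lazy import: resolved each time judge() runs, not at definition time.
    from fake_agent_module import Agent
    return Agent()(prompt)


with patch("fake_agent_module.Agent") as mock_cls:
    # Agent() returns mock_cls.return_value; calling that returns "mocked".
    mock_cls.return_value.return_value = "mocked"
    patched = judge("is this concise?")

unpatched = judge("is this concise?")
print(patched, unpatched)
```

Had the import been at module top level in the code under test, the patch would need to target the importing module's own name instead.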

Port fidelity of the new changes themselves

The conversion of Validator to a @runtime_checkable Protocol preserves the exact call shape ((response, agent) positionally, return bool | dict | ValidationOutcome, sync or async), so all TS-equivalent behavior in _fn_validator is untouched. The structured-logging reformat changes only the log string, not semantics.
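As an illustration of that call shape, the sketch below normalizes the three return forms, sync or async, into one outcome object. It is an assumption about how such normalization could work, not a copy of `_fn_validator`; `Outcome` is a local stand-in for `ValidationOutcome` and `run_validator` is a hypothetical helper.

```python
import asyncio
import inspect
from dataclasses import dataclass
from typing import Optional


@dataclass
class Outcome:
    """Local stand-in for the PR's ValidationOutcome."""
    passed: bool
    feedback: Optional[str] = None


async def run_validator(validator, response, agent):
    """Call a validator positionally and normalize whatever it returns."""
    raw = validator(response, agent)
    if inspect.isawaitable(raw):  # async validators return a coroutine
        raw = await raw
    if isinstance(raw, Outcome):
        return raw
    if isinstance(raw, bool):
        return Outcome(passed=raw)
    if isinstance(raw, dict):
        return Outcome(passed=bool(raw.get("passed")), feedback=raw.get("feedback"))
    raise TypeError(f"Unsupported validator return type: {type(raw).__name__}")


def sync_ok(response, agent):
    return True


async def async_fail(response, agent):
    return {"passed": False, "feedback": "Too long."}


ok = asyncio.run(run_validator(sync_ok, "hi", None))
bad = asyncio.run(run_validator(async_fail, "hi", None))
print(ok, bad)
```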

Remaining (non-blocking)

  • resume_prompt_template is still a frozen Callable — same forward-compat argument as Validator, worth doing while the API is new (the /strands review bot flagged this too).
  • CI note: all Python unit-test jobs are green across 3.10–3.14 / linux+windows (the CANCELLED entries are superseded runs); the label-size check failure is labeler noise unrelated to this change.
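The forward-compatibility argument for `resume_prompt_template` can be sketched as follows. The parameter's real signature is truncated in the description, so `ResumePromptTemplate`, its single `feedback` argument, and the `attempt` keyword are hypothetical; the point is only that a Protocol with `**kwargs` lets the caller pass new keyword arguments later without breaking existing templates.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class ResumePromptTemplate(Protocol):
    """Hypothetical Protocol form of the template type."""

    def __call__(self, feedback: str, **kwargs: Any) -> str: ...


def start_over(feedback: str, **kwargs: Any) -> str:
    # Ignores any extra keyword arguments a future version might pass.
    return f"Start over from scratch. Previous problem: {feedback}"


# A caller that later starts passing attempt=... does not break the template.
prompt = start_over("Too long.", attempt=2)
# runtime_checkable only verifies that __call__ exists, not its signature.
is_template = isinstance(start_over, ResumePromptTemplate)
print(prompt, is_template)
```

A plain `Callable[[...], str]` alias fixes the positional arity, so adding an argument later would be a breaking change for every user-supplied template.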

Good to go from a port-fidelity standpoint. 🚢

Port the GoalLoop iterative-refinement plugin from TypeScript to Python.
The plugin validates agent responses against a goal (NL string or
programmatic validator) and loops with feedback until satisfied.

Includes 35 unit tests and a dual-language documentation page.

TypeScript examples must live in sibling .ts files and be included
via --8<-- directives, not inlined in MDX. Created goal-loop.ts and
goal-loop_imports.ts with proper snippet regions.

Updated docs-writer skill to make this convention unmissable:
added CRITICAL callout in Step 3b and a new top-level Gotcha.

All three skills (writer, reviewer, audit) now enforce:
- TypeScript is never inlined in MDX
- Imports live in a separate _imports.ts file with per-example regions
- Every TS example must include both imports and body snippets
- A body-only include missing imports fails review

Replace ASCII box-drawing diagram with a mermaid flowchart.
Add mermaid requirement to docs-writer and docs-reviewer skills.

Both were plain facts that belong as inline prose, not visually
loud admonitions. The caution described behavior the plugin
already warns about; the note was just context.

The callout sparing-use rule already lives in mdx-authoring.md.
Remove redundant restatements from writer/reviewer skills and
instead point to the existing guidance at the right moments.

- Reflow goal-loop.mdx prose to fill lines to ~80-90 chars
- Remove language-specific param from heading ("Stateless Retries")
- Restructure reviewer skill: split monolithic Constraints bullet
  into separate dimensions (Voice Stack, Multi-Language, Terminology,
  Code Examples with site conventions, Readability, Type Alignment)
- Add heading language-neutrality rule to writer Step 4

Prose outside tabs must be language-neutral. Replaced Python-specific
parameter names (preserve_context=False, max_attempts, stop_reason,
last_result()) with plain English equivalents. Updated writer skill
to make language-neutral shared prose the top-level rule.

…ator

- Decompose build_judge_prompt into small named helpers instead of nested loops
- Rename _nl_validator to _judge_validator for clarity
- Add integration tests mirroring the TS integ suite (standard loop + preserve_context=False)
- Type WeakSet/WeakKeyDictionary with Agent instead of Any
- Reject timeout <= 0 (was allowing 0 which causes immediate timeout)
- Use unicode ellipsis in truncation to match TS output format
- Include system_prompt in Ralph-mode snapshot (restores TS parity)
- Use Literal type for GoalStopReason instead of bare str
- Move `import inspect` to module level
- Fix mypy: type judge helpers with Message/ContentBlock/ToolResultContent
- Add NL judge unit tests (construction, feedback, model/prompt overrides,
  fallback path, fresh-agent-per-validation)
- Remove unused pytest import from integ tests
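The WeakKeyDictionary typing item above relates to keeping per-agent results without pinning agents in memory. A minimal sketch of that pattern follows; `ResultStore` and `FakeAgent` are hypothetical, and the plugin's real bookkeeping may differ.

```python
import gc
from weakref import WeakKeyDictionary


class FakeAgent:
    """Stand-in for an agent; plain instances are hashable and weak-referenceable."""


class ResultStore:
    def __init__(self):
        # Entries disappear automatically once the agent is garbage collected.
        self._results = WeakKeyDictionary()

    def record(self, agent, result):
        self._results[agent] = result

    def last_result(self, agent):
        return self._results.get(agent)

    def __len__(self):
        return len(self._results)


store = ResultStore()
agent = FakeAgent()
store.record(agent, "passed")
seen = store.last_result(agent)

del agent     # drop the only strong reference
gc.collect()  # make collection explicit for the demonstration
remaining = len(store)
print(seen, remaining)
```

One plugin instance can then be shared across several agents while each agent's last result stays separate.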
…ment

- Convert Validator from Callable alias to Protocol with **kwargs for
  forward-compatible extensibility
- Fix logger.warning to use structured format (plugin=<%s>, error=<%s> |)
- Use full GoalResult/GoalAttempt equality assertions instead of per-field
- Add comment explaining Python snapshot preset divergence from TS

Replace 'plugins' (not in the collection schema) with 'event-loop'.
@github-actions

github-actions Bot commented Jun 12, 2026

Contributor

Documentation Preview Ready

Your documentation preview has been successfully deployed!

Changed pages:

Updated at: 2026-06-15T15:54:21.146Z

- Remove hook/event implementation details from "How It Works" section
- Convert JUDGE_SYSTEM_PROMPT to triple-quoted string for readability
- Extract TS word_count_validator into a named function (matches Python)
- Normalize variable naming to `plugin` in "Inspecting Results" examples
- Replace Spanish resume prompt examples with English
- Use <Syntax> component for language-specific inline terms
- Add prompt-authoring tag to link goal-loop with steering
- Add explanatory comments for WeakKeyDictionary and WeakSet usage

The "start over from scratch" prompt shows a real reason to customize:
it diverges from the default incremental-fix behavior rather than
restating the default in slightly different words.
@github-actions

Contributor

Assessment: Comment (one merge-gate fix before merge)

Clean, well-documented port with strong test coverage. I verified the suite locally — 41/41 unit tests pass, mypy is clean, and ruff check (lint) is clean. The prior review history (port fidelity, Validator Protocol, structured logging, full-object assertions, snapshot-preset docs) is all addressed, so I focused only on what's new.

What I found
  • Formatting gate (worth fixing before merge): ruff format --check fails on plugin.py — see the inline comment. CI's ci.yml runs this, so it'll block merge despite the hatch run prepare checkbox. One ruff format run clears it.
  • Timeout semantics (suggestion): timeout is checked before validation, so a final response that would pass is reported as timeout/not-passed. Inline note suggests validating first or documenting the choice.

Only the formatter is blocking; everything else is non-blocking. Nice work — the docs page and judge/validator separation read really well.
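The timeout-ordering point can be shown with a tiny decision function. This is a sketch of the two orderings under discussion, not the plugin's code; `decide` and its arguments are hypothetical.

```python
def decide(passed: bool, timed_out: bool, validate_first: bool) -> str:
    """Return the loop's next state for one finished attempt."""
    if validate_first:
        # A passing response wins even if the budget ran out while producing it.
        if passed:
            return "satisfied"
        return "timeout" if timed_out else "retry"
    # Timeout-first: the deadline is checked before the response is validated.
    if timed_out:
        return "timeout"
    return "satisfied" if passed else "retry"


# The edge case from the review: a good final response that arrives late.
timeout_first = decide(passed=True, timed_out=True, validate_first=False)
validation_first = decide(passed=True, timed_out=True, validate_first=True)
print(timeout_first, validation_first)
```

The two orderings differ only in that one case. Timeout-first also avoids spending a validation call (possibly a judge model call) after the budget is gone, which is one reason to document the choice instead of changing it.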

zastrowm previously approved these changes Jun 13, 2026
…mple

Explicitly type the validator return as Promise<ValidationOutcome> and cast
structuredOutput to satisfy the Validator type constraint.
@github-actions

Contributor

Re-review — head 3d6e13a

Assessment: Comment (one merge-gate fix still outstanding)

The new commit since my last pass is docs-only (the TS custom_judge typecheck fix) and it checks out — Promise<ValidationOutcome> + the as ValidationOutcome cast with a ?? {passed:false,...} fallback correctly mirror the Python judge path.
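On the Python side, the equivalent of the `?? {passed:false,...}` fallback could look like the sketch below. It assumes the judge's structured output may be missing; `Outcome`, `outcome_from_judge`, the dict-shaped judge output, and the fallback feedback text are hypothetical stand-ins, not the plugin's actual code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Outcome:
    """Local stand-in for the PR's ValidationOutcome."""
    passed: bool
    feedback: Optional[str] = None


def outcome_from_judge(structured_output: Optional[dict]) -> Outcome:
    """Treat a missing structured output as a failed attempt, never a pass."""
    if structured_output is None:
        return Outcome(passed=False, feedback="Judge returned no structured output.")
    return Outcome(
        passed=bool(structured_output["passed"]),
        feedback=structured_output.get("feedback"),
    )


missing = outcome_from_judge(None)
present = outcome_from_judge({"passed": True})
print(missing, present)
```

Failing closed here means a malformed judge reply costs one extra attempt instead of silently accepting an unchecked response.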

Still blocking: my earlier inline comment about ruff format --check failing on plugin.py is unaddressed — the docs commit didn't touch it. Re-confirmed on this head:

$ ruff format --check src/strands/vended_plugins/goal/plugin.py
Would reformat: src/strands/vended_plugins/goal/plugin.py

It's the same three spots (blank line before Validator L79 and GoalStopReason L97; the __call__ signature collapses to one line). ci.yml runs this check, so it'll block merge. One ruff format run clears it.

Everything else is green on this head — 41/41 unit tests, ruff check, and mypy all pass. The non-blocking timeout-semantics suggestion is already covered by the docs' Limitations section, so no action needed there.

Labels

area-hooks, area-structured-output, documentation, enhancement, python, size/xl


3 participants